Matching an approximately located query image against a reference image set

ABSTRACT

Aspects of the invention pertain to matching a selected image/photograph against a database of reference images having location information. The image of interest may include some location information itself, such as latitude/longitude coordinates and orientation. However, the location information provided by a user's device may be inaccurate or incomplete. The image of interest is provided to a front end server, which selects one or more cells to match the image against. Each cell may have multiple images and an index. One or more cell match servers compare the image against specific cells based on information provided by the front end server. An index storage server maintains index data for the cells and provides them to the cell match servers. If a match is found, the front end server identifies the correct location and orientation of the received image, and may correct errors in an estimated location of the user device.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 12/632,338, filed Dec. 7, 2009, the entire disclosure of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

Aspects of the invention relate generally to digital imagery. More particularly, aspects are directed to matching a received image with geolocation information against selected reference images.

2. Description of Related Art

Mobile user devices such as cellular telephones and personal digital assistants (“PDAs”) often include digital cameras among other features. Such devices marry the benefits of wireless access with electronic photography. A user may take pictures of friends and family, points of interest, etc., and share those pictures instantly.

Image recognition can be used on the pictures. For instance, applications such as mobile visual search programs may analyze these pictures in an attempt to identify features such as points of interest and the like. However, mobile visual searching can be computationally intensive as well as time consuming, and depending on the device that captures the image, may rely on incomplete or inaccurate location information associated with the image. Aspects of the invention address these and other problems.

SUMMARY OF THE INVENTION

In one embodiment, an image processing method is provided. The method comprises receiving an image request from a user device, the image request including an image of interest and location metadata for the image of interest; analyzing the location metadata to select one or more cells to evaluate against the image of interest, each cell having one or more geolocated images and index data associated therewith; for each selected cell, comparing the image of interest against the index data of that cell; identifying any matches from the geolocated images of the selected cells based on the compared index data; and providing the matches.

In one alternative, the matches are provided along with a match confidence indicator that identifies a likelihood or accuracy of each match. Here, a value of the match confidence indicator desirably depends on geolocation verification between the location metadata and location information for the geolocated images of the selected cells.

In another alternative, updated location metadata for the image of interest is provided to the user device along with the matches. In a further alternative, the index data is stored in an index storage server, and the index data for each selected cell is accessed with a key representing that cell's unique ID.

In yet another alternative, the index data corresponds to features of the geolocated images. In one example, the features are selected from the set consisting of corners, edges or lines, brightness information and histogram information. In another example, the geolocated images are stored in an image database and the index data is stored in a cell database. And in a further example, the index data is stored in a k-dimensional tree format. And in one example, each cell has a unique ID derived from geolocation coordinates of that cell.

In another embodiment, an image processing apparatus is provided. The apparatus comprises a front end module and a cell match module. The front end module is configured to receive an image request from a user device. The image request includes an image of interest and location metadata for the image of interest. The front end module is further configured to analyze the location metadata to select one or more cells to evaluate against the image of interest. Each cell has one or more geolocated images and index data associated therewith. The cell match module is configured to compare the image of interest against the index data of the selected cells and to identify any matches from the geolocated images of the selected cells based on the compared index data.

In one example, the cell match module comprises a plurality of cell match servers, and given ones of the cell match servers are assigned to perform the comparison for a corresponding one of the selected cells. Here, the matches may be provided along with a match confidence indicator that identifies a likelihood or accuracy of each match. Alternatively, the apparatus further comprises an indexed module configured to store the index data of each cell. Here, each cell desirably has a unique ID associated therewith. In this case, each given cell match server accesses the index data of the corresponding cell from the indexed module using a key representing the unique ID of that cell. Preferably the unique ID for each cell is derived from geolocation coordinates of that cell.

In a further alternative, the index data corresponds to features of the geolocated images. And in this case, the features are desirably selected from the set consisting of corners, edges or lines, brightness information and histogram information.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an image of interest.

FIGS. 2A-B illustrate a mobile user device in accordance with aspects of the invention.

FIGS. 3A-B illustrate camera angle parameters.

FIG. 4 illustrates an image capture process.

FIG. 5 illustrates an image capture scenario.

FIG. 6 illustrates a computer system for use with aspects of the invention.

FIG. 7 illustrates aspects of the computer system of FIG. 6.

FIGS. 8A-C illustrate cell arrangements in accordance with aspects of the invention.

FIG. 9 illustrates an image matching system in accordance with aspects of the invention.

DETAILED DESCRIPTION

Aspects, features and advantages of the invention will be appreciated when considered with reference to the following description of preferred embodiments and accompanying figures. The same reference numbers in different drawings may identify the same or similar elements. Furthermore, the following description is not limiting; the scope of the invention is defined by the appended claims and equivalents.

As noted above, users of mobile devices may take pictures of people, places or things of interest. FIG. 1 is an exemplary image 100 which may be captured by a mobile user device. An example of a street level image is an image of geographic objects, people and/or objects that was captured by a camera at an angle generally perpendicular to the ground, or where the camera is positioned at or near ground level. Both the geographic objects in the image and the camera have a geographic location relative to one another. Thus, as shown in FIG. 1, the street level image 100 may represent various geographic objects such as buildings 102 and 104, a sidewalk 106, street 108, vehicle 110 and people 112. It will be understood that while street level image 100 only shows a few objects for ease of explanation, a typical street level image will contain as many objects associable with geographic locations (street lights, signs and advertisements, mountains, trees, sculptures, bodies of water, storefronts, etc.) in as much detail as may be captured by an imaging device such as a digital camera.

In addition to being associated with geographic locations, images such as street level image 100 may be associated with information indicating the orientation of the image. For example, if the street level image comprises a typical photograph, the orientation may simply be the camera angle such as an angle that is 30° east of true north and rises 2° from ground level. If the street level images are panoramic images, such as 360° panoramas centered at the geographic location associated with the image, the orientation may indicate the portion of the image that corresponds with looking due north from the camera position at an angle directly parallel to the ground.

FIGS. 2A-B illustrate a mobile user device 200 that is configured to capture images. As shown in FIG. 2A, the mobile user device 200 may be a PDA or cellular telephone having a touch-screen display 202, general-purpose button 204, speaker 206, and microphone 208 on the front. The left side includes volume button(s) 210. The top side includes an antenna 212 and GPS receiver 214. As shown in FIG. 2B, the back includes a camera 216. The camera may be oriented in a particular direction (hereafter, “camera angle”). And as shown in the front panel of FIG. 2A, a zooming button or other actuator 218 may be used to zoom in and out of an image on the display.

The camera may be any device capable of capturing images of objects, such as digital still cameras, digital video cameras and image sensors (by way of example, CCD, CMOS or other). Images may be stored in conventional formats, such as JPEG or MPEG. The images may be stored locally in a memory of the device 200, such as in RAM or on a flash card. Alternatively, the images may be captured and uploaded into a remote database.

The camera angle may be expressed in three dimensions as shown by the X, Y and Z axes in FIG. 2B and schematically in FIGS. 3A and 3B. It shall be assumed for ease of understanding and not limitation that the camera angle is fixed relative to the orientation of the device. In that regard, FIG. 3A illustrates a potential pitch of the device (as seen looking towards the left side of the device) relative to the ground, e.g., relative to the plane perpendicular to the direction of gravity.

FIG. 3B illustrates a potential latitude/longitude angle of the device (as seen looking down towards the top side of the device), e.g., the camera direction in which the camera points relative to the latitude and longitude. Collectively, the pitch and latitude/longitude angle define a camera pose or location and orientation. The roll (rotation about the Y axis of FIG. 2B), yaw/azimuth and/or altitude may also be captured. This and other image-related information may be outputted as numerical values by an accelerometer (not shown) or other component in the device 200, used by the device's processor, and stored in the memory of the device.
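
By way of illustration only, the following Python sketch shows one way such pose values might be collected into a single record on the device. The field names and units are assumptions made for the example and are not part of the invention.

    from dataclasses import dataclass

    @dataclass
    class CameraPose:
        """Hypothetical record of the pose values described above."""
        latitude: float   # degrees, from the GPS receiver
        longitude: float  # degrees, from the GPS receiver
        altitude: float   # meters, if available
        pitch: float      # degrees above/below the horizontal plane
        azimuth: float    # degrees clockwise from true north (yaw)
        roll: float       # degrees of rotation about the camera's Y axis

    # Example: device held roughly level, pointing 30 degrees east of north.
    pose = CameraPose(latitude=37.4220, longitude=-122.0840, altitude=10.0,
                      pitch=2.0, azimuth=30.0, roll=0.0)
    print(pose)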

In one aspect, a user may position the client device 200 with the camera 216 facing an object of interest. In that regard, as shown in FIG. 4, the user may stand in front of an object of interest, such as a building or monument, and orient the camera 216 in a direction 220 that points toward a spot 222 on the point of interest.

The camera 216 of the client device 200 may be used to help the user orient the device to the desired position on the object of interest, here building 102. In this regard, the display 202 may also display a target, bull's-eye or some other indicator to indicate the exact or approximate position of the object at which the device 200 is pointed.

Once an image is captured, the user may elect to share the image with others. Or, alternatively, the user may look for more information about an object in the image. A visual search application may be employed to identify information about the image. Then, relevant information concerning the image may be provided to the user. In a case where the image is sent to others or stored in an external database, the relevant information about the image may also be stored or indexed with the image. However, a primary issue is the proper analysis and classification of the image.

One aspect provides a system and method to match an image with some location information against a database of previously geolocated reference images. As will be explained in detail below, the database of reference images may be split into geographic cells. The received image is matched against a subset of those cells.

When a user takes a picture of an object of interest such as a building (e.g., a storefront) using his or her mobile device, it is desirable to quickly identify information about that building. In the example of FIG. 5, the camera on the mobile user device 200 takes a picture of the building 102.

The GPS unit of the device 200 may provide a rough location of where the picture was taken. However, the device's GPS sensor may not be accurate enough to disambiguate at the individual building level. In addition, the device may not always record or provide an orientation/direction, which may be needed to determine which direction the device's camera is pointing. And even if the orientation/direction is provided, it may not be very accurate. Thus, in the example of FIG. 5, it is possible for the wrong building to be identified (e.g., building 104), or for a background building to be identified instead of a person or a foreground object, or vice versa. In order to overcome such problems, one aspect of the invention matches the photograph against a database of reference images.

A system comprising image and/or map databases may be employed. As shown in FIG. 6, system 300 presents a schematic diagram depicting various computing devices that can be used alone or in a networked configuration in accordance with aspects of the invention. For example, this figure illustrates a computer network having a plurality of computers 302, 304 and 306 as well as other types of devices such as mobile user devices, e.g., a laptop/palmtop 308, mobile phone 310 and a PDA 312. The mobile user devices may include the components discussed above with regard to mobile user device 200. Various devices may be interconnected via a local bus or direct connection 314 and/or may be coupled via a communications network 316 such as a LAN, WAN, the Internet, etc., which may be wired or wireless.

Each computer device may include, for example, user inputs such as a keyboard 318 and mouse 320 and/or various other types of input devices such as pen-inputs, joysticks, buttons, touch screens, etc., as well as a display 322, which could include, for instance, a CRT, LCD, plasma screen monitor, TV, projector, etc. Each computer 302, 304, 306 and 308 may be a personal computer, server, etc. By way of example only, computer 306 may be a personal computer while computers 302 and 304 may be servers. Databases such as image database 324 and map database 326 may be accessible to one or more of the servers or other devices.

As shown in diagram 400 of FIG. 7, the devices contain a processor 402, memory/storage 404 and other components typically present in a computer. Memory 404 stores information accessible by processor 402, including instructions 406 that may be executed by the processor 402. It also includes data 408 that may be retrieved, manipulated or stored by the processor. The memory may be of any type capable of storing information accessible by the processor, such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, Blu-Ray™ Disc, write-capable, and read-only memories. The processor 402 may be any well-known processor, such as processors from Intel Corporation or Advanced Micro Devices. Alternatively, the processor may be a dedicated controller such as an ASIC.

The instructions 406 may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. In that regard, the terms “instructions,” “steps” and “programs” may be used interchangeably herein. The instructions may be stored in object code format for direct processing by the processor, or in any other computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. For example, instructions 406 may include image processing programs for analyzing received imagery. Functions, methods and routines of the instructions are explained in more detail below.

Data 408 may be retrieved, stored or modified by processor 402 in accordance with the instructions 406. For instance, although systems and methods according to aspects of the invention are not limited by any particular data structure, the data may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files. The data may also be formatted in any computer-readable format. By further way of example only, image data may be stored as bitmaps comprised of pixels that are stored in compressed or uncompressed, lossless or lossy formats (e.g., JPEG), vector-based formats (e.g., SVG) or computer instructions for drawing graphics. The data may comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information that is used by a function to calculate the relevant data.

Although FIG. 7 functionally illustrates the processor and memory as being within the same block, it will be understood by those of ordinary skill in the art that the processor and memory may actually comprise multiple processors and memories that may or may not be stored within the same physical housing. For example, some of the instructions and data may be stored on a removable CD-ROM or DVD-ROM and others within a read-only computer chip. Some or all of the instructions and data may be stored in a location physically remote from, yet still accessible by, the processor. Similarly, the processor may actually comprise a collection of processors which may or may not operate in parallel.

In one aspect, computer 302 is a server communicating with one or more mobile user devices 308, 310 or 312 and a database such as image database 324 or map database 326. For example, computer 302 may be a web server or application server. Each mobile user device may be configured similarly to the server 302, with a processor, memory and instructions. Each mobile user device may also include a wireless transceiver (e.g., cellular telephone transceiver, Bluetooth, 802.11-type modem or WiFi). As shown in FIG. 7, the database(s) desirably stores images 416, including, for example, the location and orientation (if known) of each image, and/or maps or cells 418. The maps/cells may each have an index and a unique ID.

In addition to having a processor, memory, a display and the like, the mobile user devices 308, 310 and 312 desirably also include the camera 216, GPS receiver 214, an accelerometer 410 and a transceiver 412 for communication with a network or individual remote devices. A browser 414 or other user interface platform may work in conjunction with the display and user input(s).

In accordance with one aspect of the invention, in order to determine whether an object of interest is a place such as a building, the picture is matched against a database of images from the approximate geographic region where the picture was taken. To keep the matching tractable, any location information (e.g., GPS coordinates) received from the mobile user device that is associated with the image may be used. Thus, the device's GPS coordinates may be used as a rough guide to pick an appropriate set of imagery from the database. Then, that imagery can be matched against the image from the user's device.

Once the received image is matched to a known image, a more refined location can be associated with the received image. Or, alternatively, the location and orientation of the mobile user device can be corrected. This may be done by solving for the relative pose or relative location and orientation of the received image based on correspondences with image information from the database. Alternatively, the known position and orientation of the reference image(s) may be used directly. This information may be updated at the device itself, may be maintained in the network (e.g., by server 302), or both. Additionally, if there is a strong match against a building or other point of interest, then it is likely that the user is interested in that point of interest.

The image processing may be split into two parts. One is index building. The other is matching against a pre-built index. FIGS. 8A-C illustrate one way to perform index building. First, as shown in FIG. 8A, a region 450 may include a geographic cell 452. One or more images are associated with the cell 452. As used herein, a “cell” includes a delimited geographic area at some point on the Earth. Cells may be of varying sizes. For instance, cells may be on the order of 10s to 100s of meters on each side. Depending upon the amount of information available, the region 450 may be split into smaller cells. Thus, as shown in FIG. 8B, it may have four cells 454. Or as shown in FIG. 8C, it may have sixteen cells 456. In one example, there may be dozens or hundreds of images associated with a given cell.
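
By way of example only, the following Python sketch shows one way a region could be split into cells of roughly the sizes mentioned above. The function name, the fixed 100-meter cell size and the equirectangular approximation are assumptions made for illustration.

    import math

    def split_region(lat_min, lat_max, lng_min, lng_max, cell_size_m=100.0):
        """Split a lat/lng bounding box into roughly square cells.

        cell_size_m is the approximate cell edge in meters (here 100 m,
        i.e., within the 10s-to-100s-of-meters range noted above). Returns
        a list of (lat_min, lat_max, lng_min, lng_max) tuples, one per cell.
        """
        meters_per_deg_lat = 111_320.0  # near-constant over the Earth
        meters_per_deg_lng = 111_320.0 * math.cos(
            math.radians((lat_min + lat_max) / 2))
        dlat = cell_size_m / meters_per_deg_lat
        dlng = cell_size_m / meters_per_deg_lng

        cells = []
        lat = lat_min
        while lat < lat_max:
            lng = lng_min
            while lng < lng_max:
                cells.append((lat, min(lat + dlat, lat_max),
                              lng, min(lng + dlng, lng_max)))
                lng += dlng
            lat += dlat
        return cells

    # Example: split a block a few hundred meters on a side into ~100 m cells.
    print(len(split_region(37.4220, 37.4256, -122.0860, -122.0815)))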

The imagery of each cell has certain features. For instance, each image may be associated with location information such as latitude/longitude, orientation and height. Each image also includes image details. These image details may include corners, edges or lines, brightness changes, histograms or other image filtering outputs from known image processing techniques. Some or all of these features may be extracted from the images and stored in an index for the given cell. The database(s) may store the imagery itself in a known image format such as JPEG. The index and cell information may be stored in any convenient format. Although the invention is not limited by any particular data structure, the data may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents or flat files such as keyhole flat files. The indexed features are desirably stored in a form that allows fast comparison with query features, such as a k-dimensional tree (kd-tree).
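
As a non-limiting sketch of this indexing step, the following Python example extracts simple per-tile histogram descriptors (a stand-in for the corner, edge or other features described above) and places them in a kd-tree for fast comparison. It assumes the scipy library; the descriptor choice and tile size are illustrative only.

    import numpy as np
    from scipy.spatial import cKDTree

    def extract_features(image):
        """Toy descriptor: a normalized intensity histogram for each 32x32 tile.

        A production system would use stronger local features (corners,
        edges, etc.); the tiling here just yields something indexable.
        """
        h, w = image.shape
        descriptors = []
        for y in range(0, h - 31, 32):
            for x in range(0, w - 31, 32):
                tile = image[y:y + 32, x:x + 32]
                hist, _ = np.histogram(tile, bins=16, range=(0, 255))
                descriptors.append(hist / hist.sum())
        return np.array(descriptors)

    def build_cell_index(cell_images):
        """Build one kd-tree over all descriptors from a cell's images."""
        all_desc, owners = [], []
        for image_id, image in cell_images.items():
            desc = extract_features(image)
            all_desc.append(desc)
            owners.extend([image_id] * len(desc))
        return cKDTree(np.vstack(all_desc)), owners

    # Example with synthetic grayscale images standing in for cell imagery.
    rng = np.random.default_rng(0)
    cell_images = {"img_a": rng.integers(0, 256, (128, 128)),
                   "img_b": rng.integers(0, 256, (128, 128))}
    tree, owners = build_cell_index(cell_images)
    print(tree.n, "descriptors indexed")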

Each cell preferably also has a unique ID associated with it. For instance, the unique ID may be derived from the coordinates of the cell (e.g., the center latitude/longitude of the cell). A received image may be quickly matched against a given index. By way of example only, the indices may be written to a key-value store database, where the key is the cell's unique ID. Here, the value is the created index for the given cell.
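
A minimal sketch of such a key-value arrangement, assuming a simple string key derived from the cell's center coordinates, might look as follows; the ID format and the in-memory dictionary standing in for the store are assumptions for illustration.

    def cell_id(center_lat, center_lng, precision=4):
        """Derive a cell's unique ID from its center coordinates.

        The exact format is an assumption; any scheme that maps a cell's
        coordinates to a stable string key would serve.
        """
        return f"cell_{center_lat:.{precision}f}_{center_lng:.{precision}f}"

    # Trivial in-memory stand-in for the key-value index store:
    # key = cell's unique ID, value = the index built for that cell.
    index_store = {}

    def put_index(center_lat, center_lng, built_index):
        index_store[cell_id(center_lat, center_lng)] = built_index

    def get_index(center_lat, center_lng):
        return index_store.get(cell_id(center_lat, center_lng))

    put_index(37.4225, -122.0855, "...kd-tree or other index...")
    print(cell_id(37.4225, -122.0855))  # -> cell_37.4225_-122.0855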

The database may also take into account the direction that the reference image(s) is facing. Discrete compass directions may be used. Here, a separate index may be created for each direction.
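
For example, a direction-specific key could be formed by appending a discrete compass sector to the cell key, as in the following sketch; the eight-sector scheme and key format are assumptions.

    COMPASS_DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

    def directional_key(cell_key, heading_deg):
        """Append a discrete compass direction to a cell's key.

        heading_deg is the direction the reference image faces, in degrees
        clockwise from north; each cell then has one index per direction.
        """
        sector = int(((heading_deg % 360) + 22.5) // 45) % 8
        return f"{cell_key}|{COMPASS_DIRECTIONS[sector]}"

    print(directional_key("cell_37.4225_-122.0855", 170.0))  # -> ...|S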

FIG. 9 illustrates a system 500 for performing image matching once indices have been built. The system 500 comprises modules for handling different aspects of the image matching. The system 500 preferably includes a first module, shown as comprising a front end server 502. A cell match module may comprise one or more cell match servers 504. And an indexed module comprises an index storage server 506. Each of these servers may be configured as described above with the server 302 shown in FIGS. 6 and 7. While not shown, the index storage server 506 may be coupled to image database 324 and map/cell database 326. While the servers are shown as discrete devices, it is possible to employ a single machine having multiple subprocessors operating as the different servers.

The front end server 502 receives an image/match request, for instance from an application or interface on the mobile user device. The request includes an image and corresponding metadata about the image's geographical location and orientation (if available). The front end server 502 uses the image's received location information, plus an estimate of any possible error in the location information, to determine a small subset of cells to match the image against.
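
By way of example only, the following Python sketch shows one way the front end might enumerate candidate cell keys from a reported location and an error radius; the 100-meter cell size, the key format and the grid approximation are assumptions carried over from the earlier sketches.

    import math

    def candidate_cell_keys(lat, lng, error_radius_m, cell_size_m=100.0):
        """List the cell keys falling within the reported location's error radius.

        Cell keys follow the center-coordinate format sketched earlier; a real
        system would snap these candidates to the actual cell grid.
        """
        meters_per_deg_lat = 111_320.0
        meters_per_deg_lng = 111_320.0 * math.cos(math.radians(lat))
        steps = math.ceil(error_radius_m / cell_size_m)

        keys = []
        for i in range(-steps, steps + 1):
            for j in range(-steps, steps + 1):
                center_lat = lat + i * cell_size_m / meters_per_deg_lat
                center_lng = lng + j * cell_size_m / meters_per_deg_lng
                keys.append(f"cell_{center_lat:.4f}_{center_lng:.4f}")
        return keys

    # A 150 m error estimate over 100 m cells yields a 5x5 block of candidates.
    print(len(candidate_cell_keys(37.4225, -122.0855, error_radius_m=150.0)))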

The image matching is conducted by one or more cell match servers 504. Matching against cells can occur in parallel using many cell match servers 504. Each cell match server 504 is provided the received image and the key of a cell that it should match the received image against. A given cell match server 504 will then query one or more index storage servers 506 to access the index data for the given cell.
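
A minimal sketch of this fan-out, assuming the index store holds the kd-tree and owner list built earlier and using a thread pool to stand in for separate cell match servers, might look as follows; the distance threshold and voting scheme are illustrative assumptions.

    from concurrent.futures import ThreadPoolExecutor

    def match_in_cell(query_descriptors, cell_key, index_store):
        """Work done by one cell match server: fetch the cell's index by key,
        then count query descriptors that have a close neighbor in it."""
        entry = index_store.get(cell_key)
        if entry is None:
            return cell_key, None, 0.0
        tree, owners = entry
        distances, neighbors = tree.query(query_descriptors)
        close = distances < 0.05              # assumed match threshold
        if not close.any():
            return cell_key, None, 0.0
        # Vote for the reference image owning the most matched descriptors.
        votes = {}
        for idx in neighbors[close]:
            votes[owners[idx]] = votes.get(owners[idx], 0) + 1
        best_image = max(votes, key=votes.get)
        return cell_key, best_image, votes[best_image] / len(query_descriptors)

    def match_cells_in_parallel(query_descriptors, cell_keys, index_store):
        """Fan the same query out to every candidate cell concurrently."""
        with ThreadPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(match_in_cell, query_descriptors, key, index_store)
                       for key in cell_keys]
            return [f.result() for f in futures]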

Each cell match server 504 matches the received image against its respective index data. One or more matching references (if any) are returned to the front end server 502. These results preferably include a match confidence indicator. In addition, the cell match server 504 may determine and return an improved/corrected position and orientation for the received image.

The cell match server 504 may use the mobile user device's location and/or orientation sensors to perform geolocation verification on any matches. If a match result indicates a location and orientation that is very different than that reported by the device's sensor(s), then the match confidence assigned to that result may be lowered accordingly.
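
One possible form of such verification, with illustrative distance and heading thresholds and a simple halving of the confidence on disagreement, is sketched below; none of the specific numbers are mandated by the invention.

    import math

    def verify_geolocation(confidence, device_lat, device_lng, device_heading,
                           match_lat, match_lng, match_heading,
                           max_distance_m=250.0, max_heading_diff_deg=60.0):
        """Lower a match's confidence if it disagrees with the device sensors."""
        meters_per_deg_lat = 111_320.0
        meters_per_deg_lng = 111_320.0 * math.cos(math.radians(device_lat))
        dx = (match_lng - device_lng) * meters_per_deg_lng
        dy = (match_lat - device_lat) * meters_per_deg_lat
        distance_m = math.hypot(dx, dy)
        heading_diff = abs((match_heading - device_heading + 180.0) % 360.0 - 180.0)

        if distance_m > max_distance_m:
            confidence *= 0.5     # far from the reported GPS position
        if heading_diff > max_heading_diff_deg:
            confidence *= 0.5     # facing a very different direction
        return confidence

    # Nearby position but a heading that disagrees by ~170 degrees: 0.9 -> 0.45.
    print(verify_geolocation(0.9, 37.4225, -122.0855, 30.0,
                             37.4230, -122.0850, 200.0))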

The index storage server(s) 506 receive the key/unique ID and return any associated data. As the index may be very large (e.g., hundreds or thousands of gigabytes of data), different subsets of data may be stored on different computers or in different datacenters. The correct data subset or partition (a shard) for a given index key may be determined using a hashing scheme.
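
By way of example only, a stable hash of the cell key modulo the shard count can route each key to a consistent partition, as in the following sketch; the use of MD5 and the shard count of 64 are assumptions.

    import hashlib

    def shard_for_key(cell_key, num_shards=64):
        """Map an index key to one of num_shards partitions.

        A stable hash keeps a given cell's index on the same index storage
        server across requests; the shard count is an assumption.
        """
        digest = hashlib.md5(cell_key.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_shards

    print(shard_for_key("cell_37.4225_-122.0855"))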

The front end server 502 is configured to collate results returned by the cell match servers 504. The front end server may threshold the match scores provided by the cell match servers. The result(s) with the highest correlation and/or confidence is (are) identified as a (possible) match.
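
A minimal collation sketch, assuming the (cell key, image ID, score) tuples produced by the earlier matching sketch and an illustrative score threshold, might look as follows.

    def collate_results(cell_results, min_score=0.2):
        """Drop low-scoring cell results, then rank the remainder by score.

        Each result is a (cell_key, image_id, score) tuple as produced by the
        matching sketch above; the threshold value is illustrative only.
        """
        kept = [r for r in cell_results if r[1] is not None and r[2] >= min_score]
        return sorted(kept, key=lambda r: r[2], reverse=True)

    results = [("cell_a", "img_1", 0.62), ("cell_b", None, 0.0),
               ("cell_c", "img_7", 0.15)]
    print(collate_results(results))  # -> [('cell_a', 'img_1', 0.62)]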

As discussed above, the results may be used to provide corrected location information to the mobile user device. They may also be used to provide enhanced content to the device. For instance, information may be provided about the point of interest in the image. Or supplemental content regarding nearby buildings and attractions may be given, such as via a local listing or Yellow Pages application. The results may also be used in an augmented reality application.

Although aspects of the invention herein have been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the invention as defined by the appended claims.

1. An image processing method, comprising: receiving an image request from a user device, the image request including an image of interest and location metadata for the image of interest; analyzing the location metadata to select one or more cells to evaluate against the image of interest, each cell having one or more geolocated images and index data associated therewith; for each selected cell, comparing the image of interest against the index data of that cell; identifying any matches from the geolocated images of the selected cells based on the compared index data; and providing the matches.
2. The image processing method of claim 1, wherein the matches are provided along with a match confidence indicator that identifies a likelihood or accuracy of each match.
3. The image processing method of claim 1, wherein updated location metadata for the image of interest is provided to the user device along with the matches.
4. The image processing method of claim 1, wherein the index data is stored in an index storage server, and the index data for each selected cell is accessed with a key representing that cell's unique ID.

5. The image processing method of claim 1, wherein the index data corresponds to features of the geolocated images.
6. The image processing method of claim 5, wherein the features are selected from the set consisting of corners, edges or lines, brightness information and histogram information.
7. The image processing method of claim 5, wherein the geolocated images are stored in an image database and the index data is stored in a cell database.
8. The image processing method of claim 5, wherein the index data is stored in a k-dimensional tree format.
9. The image processing method of claim 1, wherein each cell has a unique ID derived from geolocation coordinates of that cell.
10. An image processing apparatus, comprising: a front end module configured to receive an image request from a user device, the image request including an image of interest and location metadata for the image of interest, the front end module being further configured to analyze the location metadata to select one or more cells to evaluate against the image of interest, each cell having one or more geolocated images and index data associated therewith; and a cell match module configured to compare the image of interest against the index data of the selected cells and to identify any matches from the geolocated images of the selected cells based on the compared index data.
11. The image processing apparatus of claim 10, wherein the index data corresponds to features of the geolocated images.
12. The image processing apparatus of claim 11, wherein the features are selected from the set consisting of corners, edges or lines, brightness information and histogram information.
13. A tangible memory on which computer-readable instructions of a computer program are stored, the instructions, when executed by a processor, cause the processor to perform a method, the method comprising: receiving an image request from a user device, the image request including an image of interest and location metadata for the image of interest; analyzing the location metadata to select one or more cells to evaluate against the image of interest, each cell having one or more geolocated images and index data associated therewith; for each selected cell, comparing the image of interest against the index data of that cell; identifying any matches from the geolocated images of the selected cells based on the compared index data; and providing the matches.

14. The memory of claim 13, wherein the method provides the matches along with a match confidence indicator that identifies a likelihood or accuracy of each match.